

Autonomous Planning In-space Assembly Reinforcement-learning free-flYer (APIARY) International Space Station Astrobee Testing

Chapin, Samantha, Stewart, Kenneth, Leontie, Roxana, Henshaw, Carl Glen

arXiv.org Artificial Intelligence

The US Naval Research Laboratory's (NRL's) Autonomous Planning In-space Assembly Reinforcement-learning free-flYer (APIARY) experiment pioneers the use of reinforcement learning (RL) for control of free-flying robots in the zero-gravity (zero-G) environment of space. On Tuesday, May 27th, 2025, the APIARY team conducted the first, to our knowledge, RL-based control of a free-flyer in space, using the NASA Astrobee robot on board the International Space Station (ISS). A robust 6-degrees-of-freedom (DOF) control policy was trained using an actor-critic Proximal Policy Optimization (PPO) network within the NVIDIA Isaac Lab simulation environment, randomizing over goal poses and mass distributions to enhance robustness. This paper details the simulation testing, ground testing, and flight validation of the experiment. This on-orbit demonstration validates the transformative potential of RL for improving robotic autonomy, enabling rapid development and deployment (in minutes to hours) of tailored behaviors for space exploration, logistics, and real-time mission needs.
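The domain randomization over goal poses and mass distributions described in the abstract can be sketched as a per-episode parameter sampler; the function name and all ranges below are illustrative assumptions, not values from the paper:

```python
import random

def sample_episode_params(rng=random):
    """Randomize goal pose and mass properties per training episode.

    A sketch of domain randomization for robustness; every range here is
    an illustrative assumption, not taken from the APIARY experiment.
    """
    goal_pose = {
        "position": [rng.uniform(-1.0, 1.0) for _ in range(3)],          # meters
        "orientation_rpy": [rng.uniform(-3.14, 3.14) for _ in range(3)],  # radians
    }
    mass = rng.uniform(8.0, 12.0)                      # kg, around a nominal value
    com_offset = [rng.uniform(-0.05, 0.05) for _ in range(3)]  # meters
    return goal_pose, mass, com_offset
```

Each episode would reset the simulated free-flyer with a fresh sample, so the learned policy cannot overfit to one goal pose or one mass distribution.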


Learning to Drive Anywhere with Model-Based Reannotation

Hirose, Noriaki, Ignatova, Lydia, Stachowicz, Kyle, Glossop, Catherine, Levine, Sergey, Shah, Dhruv

arXiv.org Artificial Intelligence

Figure 1: We train a highly generalizable navigation policy that can control robots in a variety of conditions and be deployed zero-shot in new environments across the world. Our proposed method, Model-Based ReAnnotation, enables imitation learning from noisy, passive data, such as low-quality crowd-sourced demonstrations or even videos from the web.

Abstract -- Developing broadly generalizable visual navigation policies for robots is a significant challenge, primarily constrained by the availability of large-scale, diverse training data. While curated datasets collected by researchers offer high quality, their limited size restricts policy generalization. To overcome this, we explore leveraging abundant, passively collected data sources, including large volumes of crowd-sourced teleoperation data and unlabeled YouTube videos, despite their potential for lower quality or missing action labels. We propose Model-Based ReAnnotation (MBRA), a framework that utilizes a learned short-horizon, model-based expert model to relabel or generate high-quality actions for these passive datasets. This relabeled data is then distilled into LogoNav, a long-horizon navigation policy conditioned on visual goals or GPS waypoints. We demonstrate that LogoNav, trained using MBRA-processed data, achieves state-of-the-art performance, enabling robust navigation over distances exceeding 300 meters in previously unseen indoor and outdoor environments.
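The core MBRA step, relabeling passive trajectories with actions from a learned short-horizon expert, reduces to a simple loop; the `reannotate` function and `expert_model` interface below are hypothetical illustrations, not the paper's code:

```python
def reannotate(dataset, expert_model):
    """Relabel noisy or action-free (observation, goal) pairs with actions
    proposed by a learned short-horizon expert.

    A minimal sketch of the reannotation idea; the interface is an
    assumption. `expert_model(obs, goal)` is expected to return an action.
    """
    relabeled = []
    for obs, goal in dataset:
        action = expert_model(obs, goal)   # expert supplies the missing/cleaned label
        relabeled.append((obs, goal, action))
    return relabeled
```

The relabeled tuples could then be fed to an ordinary imitation-learning pipeline, which is how the abstract describes distilling the data into the long-horizon LogoNav policy.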


OmniVLA: An Omni-Modal Vision-Language-Action Model for Robot Navigation

Hirose, Noriaki, Glossop, Catherine, Shah, Dhruv, Levine, Sergey

arXiv.org Artificial Intelligence

Figure 1: We train a highly generalizable vision-based navigation policy with flexible conditioning, leveraging over 9,500 hours of data collected across 10 different platforms. Our policy supports diverse goal modalities, including language prompts, goal poses, goal images, and their combinations, and can control a variety of robot platforms.

Abstract -- Humans can flexibly interpret and compose different goal specifications, such as language instructions, spatial coordinates, or visual references, when navigating to a destination. In contrast, most existing robotic navigation policies are trained on a single modality, limiting their adaptability to real-world scenarios where different forms of goal specification are natural and complementary. In this work, we present a training framework for robotic foundation models that enables omni-modal goal conditioning for vision-based navigation. Our approach leverages a high-capacity vision-language-action (VLA) backbone and trains with three primary goal modalities: 2D poses, egocentric images, and natural language, as well as their combinations, through a randomized modality fusion strategy. This design not only expands the pool of usable datasets but also encourages the policy to develop richer geometric, semantic, and visual representations. The resulting model, OmniVLA, achieves strong generalization to unseen environments, robustness to scarce modalities, and the ability to follow novel natural language instructions. We demonstrate that OmniVLA outperforms specialist baselines across modalities and offers a flexible foundation for fine-tuning to new modalities and tasks. We believe OmniVLA provides a step toward broadly generalizable and flexible navigation policies, and a scalable path for building omni-modal robotic foundation models.
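The randomized modality fusion strategy can be sketched as randomly dropping goal modalities during training while always keeping at least one; the function name and sampling scheme below are illustrative assumptions:

```python
import random

def sample_goal_condition(pose, image, text, rng=random):
    """Randomly keep a non-empty subset of the provided goal modalities.

    A sketch of randomized modality fusion: by training on random subsets,
    the policy learns to navigate from any combination of goal inputs.
    The uniform subset-size sampling here is an assumption.
    """
    modalities = {"pose": pose, "image": image, "text": text}
    available = [name for name, value in modalities.items() if value is not None]
    n_keep = rng.randint(1, len(available))            # keep at least one modality
    keep = set(rng.sample(available, n_keep))
    return {name: (value if name in keep else None)
            for name, value in modalities.items()}
```

During training, the policy would receive the masked dictionary each step, so at test time it remains robust when some modalities are scarce or missing, as the abstract claims.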


Multi-step manipulation task and motion planning guided by video demonstration

Zorina, Kateryna, Kovar, David, Fourmy, Mederic, Lamiraux, Florent, Mansard, Nicolas, Carpentier, Justin, Sivic, Josef, Petrik, Vladimir

arXiv.org Artificial Intelligence

This work aims to leverage instructional video to solve complex multi-step task-and-motion planning tasks in robotics. Towards this goal, we propose an extension of the well-established Rapidly-Exploring Random Tree (RRT) planner, which simultaneously grows multiple trees around grasp and release states extracted from the guiding video. Our key novelty lies in combining contact states and 3D object poses extracted from the guiding video with a traditional planning algorithm, which allows us to solve tasks with sequential dependencies, for example, if an object needs to be placed at a specific location to be grasped later. We also investigate the generalization capabilities of our approach beyond the scene depicted in the instructional video. To demonstrate the benefits of the proposed video-guided planning approach, we design a new benchmark with three challenging tasks: (i) 3D re-arrangement of multiple objects between a table and a shelf, (ii) multi-step transfer of an object through a tunnel, and (iii) transferring objects using a tray, similar to how a waiter transfers dishes. We demonstrate the effectiveness of our planning algorithm on several robots, including the Franka Emika Panda and the KUKA KMR iiwa. For a seamless transfer of the obtained plans to the real robot, we develop a trajectory refinement approach formulated as an optimal control problem (OCP). Traditional robot motion planning algorithms seek a collision-free path from a given starting robot configuration to a given goal robot configuration [1]. Despite the large dimensionality of the configuration space, sampling-based motion planning algorithms [2], [3] have proven highly effective for solving complex motion planning problems for robots, ranging from six degrees of freedom (DoFs) for industrial manipulators to tens of DoFs for humanoids [4].
Manipulation task-and-motion planning (TAMP) [5] adds additional complexity to the problem by including movable objects in the state space. This requires the planner to discover the pick-and-place actions that connect the given start and goal robot configurations, bringing the manipulated objects from their start poses to their goal poses.

INRIA, Paris. This work is part of the AGIMUS project, funded by the European Union under GA no. 101070165. Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission.
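The idea of planning through video-extracted grasp and release states can be illustrated with a toy sketch that chains a basic 2D RRT segment-by-segment through a list of key states (the paper's method grows multiple trees simultaneously in the full configuration space; this single-tree, unit-square version is a deliberate simplification):

```python
import math
import random

def rrt(start, goal, is_free, step=0.1, iters=2000, rng=random):
    """Basic goal-biased RRT in the unit square (toy sketch, not the
    paper's multi-tree variant). Returns a path start->goal or None."""
    nodes = [start]
    parent = {0: None}
    for _ in range(iters):
        sample = goal if rng.random() < 0.1 else (rng.uniform(0, 1), rng.uniform(0, 1))
        i = min(range(len(nodes)), key=lambda j: math.dist(nodes[j], sample))
        nx, ny = nodes[i]
        dx, dy = sample[0] - nx, sample[1] - ny
        norm = math.hypot(dx, dy) or 1.0
        new = (nx + step * dx / norm, ny + step * dy / norm)
        if not is_free(new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) < step:        # close enough: backtrack the path
            path, k = [goal], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None

def plan_through_keystates(keystates, is_free):
    """Chain RRT segments through grasp/release states extracted from a video."""
    full = []
    for a, b in zip(keystates, keystates[1:]):
        segment = rrt(a, b, is_free)
        if segment is None:
            return None
        full.extend(segment)
    return full
```

Sequencing the plan through the video's grasp/release states is what resolves sequential dependencies: each segment starts from the contact state the previous one established.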


Variable-Friction In-Hand Manipulation for Arbitrary Objects via Diffusion-Based Imitation Learning

Yan, Qiyang, Ding, Zihan, Zhou, Xin, Spiers, Adam J.

arXiv.org Artificial Intelligence

Dexterous in-hand manipulation (IHM) for arbitrary objects is challenging due to the rich and subtle contact process. Variable-friction manipulation is an alternative approach to dexterity, previously demonstrated to provide robust and versatile 2D IHM capabilities with only two single-joint fingers. However, the hard-coded manipulation methods for variable-friction hands are restricted to regular polygon objects and limited target poses, and require the policy to be tailored to each object. This paper proposes an end-to-end learning-based manipulation method that achieves arbitrary object manipulation for any target pose on real hardware, with minimal engineering effort and data collection. The method features diffusion-policy-based imitation learning with co-training from simulation and a small amount of real-world data. With the proposed framework, arbitrary objects, including polygons and non-polygons, can be precisely manipulated to reach arbitrary goal poses within 2 hours of training on an A100 GPU and only 1 hour of real-world data collection. The precision exceeds that of previous customized object-specific policies, achieving an average success rate of 71.3% with an average pose error of 2.676 mm and 1.902 degrees.
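The sim/real co-training described above amounts to mixing the two data sources in every training batch; the sketch below assumes a fixed real-data fraction, which is an illustrative choice rather than the paper's actual recipe:

```python
import random

def sample_cotraining_batch(sim_data, real_data, batch_size,
                            real_fraction=0.25, rng=random):
    """Draw a mixed batch of simulated and real demonstrations.

    A sketch of sim/real co-training; the 25% real fraction is an
    assumption, not a value from the paper.
    """
    n_real = int(batch_size * real_fraction)
    batch = rng.sample(real_data, min(n_real, len(real_data)))  # scarce real data
    batch += rng.sample(sim_data, batch_size - len(batch))      # fill with sim data
    rng.shuffle(batch)
    return batch
```

Keeping a small, fixed share of real demonstrations in each batch is one common way to let cheap simulated data dominate training volume while still grounding the policy in real-hardware contact dynamics.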


The Sense of Agency in Assistive Robotics Using Shared Autonomy

Collier, Maggie A., Narayan, Rithika, Admoni, Henny

arXiv.org Artificial Intelligence

Sense of agency, a phenomenon from cognitive science representing the experience of control over one's environment, is one factor that influences people's preferences for robot assistance. However, in the assistive robotics literature, we often see paradigms that optimize measures like task success and cognitive load rather than sense of agency. In fact, prior work has found that participants sometimes express a preference for paradigms, such as direct teleoperation, that do not perform well on those other metrics but give more control to the user. In this work, we focus on a subset of assistance paradigms for manipulation called shared autonomy, in which the system combines control signals from the user and the automated controller. We run a study to evaluate sense of agency and show that higher robot autonomy during assistance leads to improved task performance but a decreased sense of agency, indicating a potential trade-off between the two. From our findings, we discuss the relation between sense of agency and optimality, and we consider a proxy metric for a component of sense of agency that might enable us to build systems that monitor and maintain sense of agency in real time.
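A common formulation of shared autonomy (not necessarily the exact one studied in this paper) blends the user's and the robot's control signals linearly, with the blending weight setting the autonomy level:

```python
def blend_control(u_user, u_robot, alpha):
    """Linear-blend shared autonomy over a control vector.

    alpha = 0.0 is pure teleoperation, alpha = 1.0 is full autonomy.
    This is one standard formulation, offered as an illustrative sketch.
    """
    assert 0.0 <= alpha <= 1.0, "autonomy level must be in [0, 1]"
    return [(1.0 - alpha) * uu + alpha * ur for uu, ur in zip(u_user, u_robot)]
```

The abstract's finding maps directly onto this knob: raising `alpha` tends to improve task performance while lowering the user's sense of agency, which is the trade-off the study measures.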


Tabletop Object Rearrangement: Structure, Complexity, and Efficient Combinatorial Search-Based Solutions

Gao, Kai

arXiv.org Artificial Intelligence

This thesis aims to provide a complete structural analysis and efficient algorithmic solutions for tabletop object rearrangement with overhand grasps (TORO). This problem captures a common task that we solve on a daily basis and is essential in enabling truly intelligent robotic manipulation. When rearranging many objects in a confined workspace, on the one hand, finding an action sequence with the fewest pick-and-place operations in TORO is NP-hard [han2018complexity]; on the other hand, temporarily relocating objects to some free space ("buffer poses") may be necessary but highly challenging in a cluttered environment. Focusing on these two challenges, the thesis covers TORO in four different setups, including varied workspace assumptions (with/without external buffers) and manipulator settings (single/dual arms or a mobile manipulator). The thesis first explores TORO with external buffers (TORE), addressing the size of the space needed for temporary object relocation ("running buffers"). This study shows that finding the maximum running buffer (MRB) is NP-hard and that the MRB can grow unbounded with an increasing number of objects, even with uniform shapes. Exact algorithms developed for both labeled and unlabeled settings scale to over 100 objects. The thesis further extends the TORE algorithms to tabletop rearrangement with internal buffers (TORI), where all temporary object placements must be inside the workspace.


Adaptive Dual-Headway Unicycle Pose Control and Motion Prediction for Optimal Sampling-Based Feedback Motion Planning

İşleyen, Aykut, Kadu, Abhidnya, van de Molengraft, René, Arslan, Ömür

arXiv.org Artificial Intelligence

Safe, smooth, and optimal motion planning for nonholonomically constrained mobile robots and autonomous vehicles is essential for achieving reliable, seamless, and efficient autonomy in the logistics, mobility, and service industries. In many such application settings, nonholonomic robots, like unicycles with restricted motion, require precise planning and control of both translational and orientational motion to approach specific locations in a designated orientation, such as charging, parking, and loading areas. In this paper, we introduce a new dual-headway unicycle pose control method that leverages an adaptively placed headway point in front of the unicycle pose and a tailway point behind the goal pose. In summary, the unicycle robot continuously follows its headway point, which chases the tailway point of the goal pose, and the asymptotic motion of the tailway point towards the goal position guides the unicycle robot to approach the goal location with the correct orientation. The simple and intuitive geometric construction of dual-headway unicycle pose control enables an explicit convex feedback motion prediction bound on the closed-loop unicycle motion trajectory for fast and accurate safety verification. We present an application of dual-headway unicycle control to optimal sampling-based motion planning around obstacles. In numerical simulations, we show that optimal unicycle motion planning using dual-headway translation and orientation distances significantly outperforms Euclidean translation and cosine orientation distances in generating smooth motion with minimal travel and turning effort.
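The geometric construction described above can be sketched with standard offset-point kinematics: place a headway point ahead of the robot, a tailway point behind the goal, and have the headway point pursue the tailway point. The fixed offsets, proportional gain, and pursuit law below are illustrative assumptions; the paper's method places the headway point adaptively:

```python
import math

def dual_headway_control(x, y, theta, gx, gy, gtheta,
                         d_h=0.3, d_t=0.3, k=1.0):
    """Sketch of a dual-headway unicycle pose controller.

    (x, y, theta): unicycle pose; (gx, gy, gtheta): goal pose.
    Offsets d_h, d_t and gain k are illustrative, and the fixed headway
    offset simplifies the paper's adaptive placement. Returns (v, omega).
    """
    # headway point a distance d_h ahead of the unicycle
    hx, hy = x + d_h * math.cos(theta), y + d_h * math.sin(theta)
    # tailway point a distance d_t behind the goal pose
    tx, ty = gx - d_t * math.cos(gtheta), gy - d_t * math.sin(gtheta)
    # desired headway-point velocity: proportional pursuit of the tailway point
    vx, vy = k * (tx - hx), k * (ty - hy)
    # invert the offset-point kinematics to recover unicycle inputs (v, omega)
    v = math.cos(theta) * vx + math.sin(theta) * vy
    omega = (-math.sin(theta) * vx + math.cos(theta) * vy) / d_h
    return v, omega
```

Because the tailway point sits behind the goal along its heading, pursuing it makes the robot arrive aligned with the goal orientation, which is the intuition the abstract describes.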


Bimanual In-hand Manipulation using Dual Limit Surfaces

Dang, An, Lorenz, James, Yi, Xili, Fazeli, Nima

arXiv.org Artificial Intelligence

In-hand object manipulation is an important capability for dexterous manipulation. In this paper, we introduce a modeling and planning framework for in-hand object reconfiguration, focusing on frictional patch contacts between the robot's palms (or fingers) and the object. Our approach leverages two cooperative patch contacts on either side of the object to iteratively reposition it within the robot's grasp by alternating between sliding and sticking motions. Unlike previous methods that rely on single-point contacts or restrictive assumptions on contact dynamics, our framework models the complex interaction of dual frictional patches, allowing for greater control over object motion. We develop a planning algorithm that computes feasible motions to reorient and re-grasp objects without causing unintended slippage. We demonstrate the effectiveness of our approach in simulation and real-world experiments, showing significant improvements in object stability and pose accuracy across various object geometries.


DAP: Diffusion-based Affordance Prediction for Multi-modality Storage

Chang, Haonan, Boyalakuntla, Kowndinya, Liu, Yuhan, Zhang, Xinyu, Schramm, Liam, Boularias, Abdeslam

arXiv.org Artificial Intelligence

The storage problem, in which objects must be accurately placed into containers with precise orientations and positions, presents a distinct challenge that extends beyond traditional rearrangement tasks. The challenge arises primarily from the need for fine-grained 6D manipulation and the inherent multi-modality of the solution space, where multiple viable goal configurations exist for the same storage container. We present a novel Diffusion-based Affordance Prediction (DAP) pipeline for the multi-modal object storage problem. DAP takes a two-step approach, first identifying a placeable region on the container and then precisely computing the relative pose between the object and that region. Existing methods either struggle with multi-modality or require computation-intensive training. Our experiments demonstrate DAP's superior performance and training efficiency over the current state of the art, RPDiff, achieving remarkable results on the RPDiff benchmark. Additionally, our experiments showcase DAP's data efficiency in real-world applications, an advance over existing simulation-driven approaches. Our contribution fills a gap in robotic manipulation research by offering a solution that is both computationally efficient and capable of handling real-world variability. Code and supplementary material can be found at: https://github.com/changhaonan/DPS.git.